From: Joey Hess
Date: Mon, 15 Sep 2025 17:42:50 +0000 (-0400)
Subject: drop problem end characters from filename operating on String not RawFilePath
X-Git-Tag: archive/raspbian/10.20251029-1+rpi1~1^2~3^2~112
X-Git-Url: https://dgit.raspbian.org/%22http://www.example.com/cgi/%22/%22http:/www.example.com/cgi/%22?a=commitdiff_plain;h=11e7211d7b1fdf1db899c6199d3eddf2b9947b66;p=git-annex.git

drop problem end characters from filename operating on String not RawFilePath

Fix bug that could cause an invalid utf-8 sequence to be used in a
temporary filename when the input filename was valid utf-8.

Sponsored-by: k0ld
---

diff --git a/CHANGELOG b/CHANGELOG
index 371f53f30f..740253853a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium
   * drop: --fast support when dropping from a remote.
   * Fix crash operating on filenames that are exactly 21 bytes long and
     begin with a utf-8 character.
+  * Fix bug that could cause an invalid utf-8 sequence to be used in a
+    temporary filename when the input filename was valid utf-8.
   * git-annex.cabal: Turn on the OsPath build flag by default.
   * Add build warnings when git-annex is built without the OsPath build
     flag.
diff --git a/Utility/Tmp.hs b/Utility/Tmp.hs
index 582f6849fc..df6673eadd 100644
--- a/Utility/Tmp.hs
+++ b/Utility/Tmp.hs
@@ -116,20 +116,29 @@ relatedTemplate' :: RawFilePath -> RawFilePath
 #ifndef mingw32_HOST_OS
 relatedTemplate' f
 	| len > templateAddedLength =
-		{- Some filesystems like FAT have issues with filenames
-		 - ending in ".", and others like VFAT don't allow a
-		 - filename to end with trailing whitespace, so avoid
-		 - truncating a filename to end that way. -}
-		let p = B.dropWhileEnd disallowed $
-			truncateFilePath (len - templateAddedLength) f
+		let p = fixend $ truncateFilePath (len - templateAddedLength) f
 		in if B.null p
 			then "t"
 			else p
 	| otherwise = f
   where
 	len = B.length f
-	disallowed c = c == dot || isSpace (chr (fromIntegral c))
+	{- Some filesystems like FAT have issues with filenames
+	 - ending in ".", and others like VFAT don't allow a
+	 - filename to end with trailing whitespace, so avoid
+	 - truncating a filename to end that way. -}
+	fixend p =
+		{- B.dropWhileEnd doesn't take wide characters
+		 - into account, but is fast, so use it to check
+		 - the common case. -}
+		let p' = B.dropWhileEnd disallowed p
+		in if p' == p
+			then p
+			else toRawFilePath $ reverse $
+				dropWhile (disallowed . fromIntegral . ord) $
+				reverse $ fromRawFilePath p
 	dot = fromIntegral (ord '.')
+	disallowed c = c == dot || isSpace (chr (fromIntegral c))
 #else
 -- Avoids a test suite failure on windows, reason unknown, but
 -- best to keep paths short on windows anyway.
diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn
index ff868a0419..4b10d6419f 100644
--- a/doc/bugs/multibyte_characters_broken.mdwn
+++ b/doc/bugs/multibyte_characters_broken.mdwn
@@ -31,3 +31,5 @@ The original file obviously has a correct encoding, but it seems that git annex
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 I use git annex to manage my whole music collection successfully.
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment b/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment
new file mode 100644
index 0000000000..8660462f46
--- /dev/null
+++ b/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-09-15T16:59:59Z"
+ content="""
+git-annex actually attempts to truncate the filename taking unicode
+character width into account.
+
+Here is the truncation on the wrong byte though:
+
+	ghci> :t x
+	x :: String
+	ghci> x
+	"ingest-01-06 \19977\30707\29748\20035\12539\23500\27810\32654\26234\24693\12539\20037\24029\32190\12539\31712\21407\24693\32654\12539\28145\35211 \26792\21152 - Tuxedo Mirage.flac"
+	ghci> toRawFilePath x
+	"ingest-01-06 \228\184\137\231\159\179\231\144\180\228\185\131\227\131\187\229\175\140\230\178\162\231\190\142\230\153\186\230\129\181\227\131\187\228\185\133\229\183\157\231\182\190\227\131\187\231\175\160\229\142\159\230\129\181\231\190\142\227\131\187\230\183\177\232\166\139 \230\162\168\229\138\160 - Tuxedo Mirage.flac"
+	ghci> relatedTemplate (toRawFilePath x)
+	"ingest-01-06 \228\184\137\231\159\179\231\144\180\228\185\131\227\131\187\229\175\140\230\178\162\231\190\142\230\153\186\230\129\181\227\131\187\228\185\133\229\183\157\231\182\190\227\131\187\231\175\160\229\142\159\230\129\181\231\190\142\227\131\187\230\183\177\232\166\139 \230\162\168\229\138"
+
+What is going on is that '\160` is a space character, and filesystems like
+FAT do not allow a filename to end with a space. So relatedTemplate trims
+off trailing spaces, and accidentially trimmed off this byte, despite it
+being part of a multibyte sequence.
+
+Aren't filesystems with arbitrary limitations on what valid filenames are fun?
+
+Fixed this.
+"""]]
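
The failure described in the bug comment can be reproduced outside git-annex. The following is a minimal sketch, not the code from `Utility/Tmp.hs`: the names `sample`, `disallowedByte`, `trimBytes` and `trimChars` are hypothetical, and it assumes the `bytestring` and `text` packages, using `Data.Text.Encoding` in place of git-annex's `toRawFilePath`/`fromRawFilePath`. It also applies the predicate to each `Char` directly, which is a simplification of the patch's `String` round trip.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, isSpace)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- The two characters "\26792\21152" from the bug report, encoded as UTF-8.
-- Their bytes are 230 162 168 229 138 160; the final byte, 160, is a
-- continuation byte of the second character.
sample :: B.ByteString
sample = TE.encodeUtf8 (T.pack "\26792\21152")

-- Byte-level predicate in the style of the patch's `disallowed`:
-- byte 46 is '.', and chr 160 (no-break space) satisfies isSpace.
disallowedByte :: Word8 -> Bool
disallowedByte c = c == 46 || isSpace (chr (fromIntegral c))

-- Trimming bytes ignores UTF-8 character boundaries, so it strips the
-- trailing 160 and leaves an incomplete multibyte sequence.
trimBytes :: B.ByteString -> B.ByteString
trimBytes = B.dropWhileEnd disallowedByte

-- Trimming decoded characters only ever removes whole characters.
-- (decodeUtf8 throws on invalid input; the sample here is valid UTF-8.)
trimChars :: B.ByteString -> B.ByteString
trimChars =
  TE.encodeUtf8 . T.dropWhileEnd (\c -> c == '.' || isSpace c) . TE.decodeUtf8

main :: IO ()
main = do
  print (B.unpack sample)             -- [230,162,168,229,138,160]
  print (B.unpack (trimBytes sample)) -- [230,162,168,229,138]
  print (B.unpack (trimChars sample)) -- [230,162,168,229,138,160]
```

The second line of output ends in `229,138`, the same truncated tail as the `relatedTemplate` result quoted in the comment above, while the character-level trim leaves the input unchanged.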